A study of isoleucines in protein structures solved using X-ray 
crystallography revealed a series of systematic trends for the 
two side-chain torsion angles χ1 and χ2 dependent on the 
resolution, secondary structure and refinement software used. 
The average torsion angles for the nine rotamers were similar 
in high-resolution structures solved using either the 
REFMAC, CNS or PHENIX software. However, at low 
resolution these programs often refine towards somewhat 
different χ1 and χ2 values. Small systematic differences can be 
observed between refinement software that uses molecular 
dynamics-type energy terms (for example CNS) and software 
that does not use these terms (for example REFMAC). 
Detailing the standard torsion angles used in refinement 
software can improve the refinement of protein structures. The 
target values in the molecular dynamics-type energy functions 
can also be improved. 
1. Introduction 

In 1991, Engh and Huber published their landmark article on 
bond lengths and bond angles (Engh & Huber, 1991). The 
parameters that they determined are used in nearly all of 
today's macromolecular software. Other authors have 
published parameters for planarities (Hooft, Sander et al., 
1996b; Sychrovsky et al., 2009; MacArthur & Thornton, 1996) 
and torsion angles (Wang et al, 2008; Jones & Thirup, 1986; 
Clore & Kuszewski, 2002; Butterfoss et al, 2005; Hooft et al, 
1997; Ponder & Richards, 1987; Dunbrack & Cohen, 1997; 
Lovell et al, 2000), while several groups have been working on 
the use of torsion angles in refinement software (Clore & 
Kuszewski, 2002; Berjanskii et al., 2006; Rice & Brünger, 1994; 
Brünger, 1992; Adams et al., 2010). The rotamericity of amino-acid 
side chains has often been studied for purposes such as 
homology modelling (Wang et al, 2008), X-ray and NMR 
refinement (Jones & Thirup, 1986; Clore & Kuszewski, 2002), 
structure validation (Laskowski, MacArthur et al, 1993; 
Laskowski, Moss et al., 1993; Hooft, Vriend et al., 1996; Lovell 
et al., 2003; Read et al., 2011) or the determination of MD 
force-field parameters (Lindorff-Larsen et al., 2010). MacArthur 
& Thornton (1999) realised that the average observed 
values for side-chain torsion angles are resolution-dependent. 
They concluded that this is caused by the fact that low-occu- 
pancy alternate side-chain conformations are often not 
observable at low resolution, and hypothesized that refine- 
ment of a single conformation in density that reflects multiple 
conformations leads to systematic torsion-angle deviations. 

Target values for parameters such as bond lengths and 
angles are similarly used as data in crystallographic refine- 
ment. Consequently, it can also be expected that their final 
values will be closer to reality at high resolution, when a large 
amount of X-ray data are available, and closer to the initial 
target values at low resolution. If these target values can be 
improved, low-resolution models in particular will improve. 

Touw & Vriend (2010) showed, for example, that the ideal τ 
angles depend on secondary structure and amino-acid type, 
and the values observed in PDB files additionally depend on 
the resolution and the refinement software used. We studied 
whether or not these factors similarly influence the observed 
isoleucine side-chain torsion angles. We conclude that determination 
of the optimal χ1 and χ2 values is a highly complicated 
task that the authors of different refinement programs 
have solved in different ways, all of which are open to 
improvement. Isoleucine was chosen for a series of reasons: it 
has two side-chain torsion angles (there are not sufficient data 
in the PDB to study residues with more variable torsion 
angles); it is an abundant residue; it is hydrophobic, which 
means that long-range interactions that are difficult to analyse 
do not come into play; and it is β-branched, which means that 
strong interactions between the side chain and the backbone 
are involved in rotamer choice. 

2. Methods 

All PDB files that have been solved by X-ray diffraction, that 
contain at least one intact isoleucine and that were released 
before February 2013 were extracted from the PDBFinder 
(Hooft, Sander et al, 1996a) database and stored in a separate 
relational database (Touw & Vriend, 2010). A series of scripts 
were used to extract the data used in this study. The database 
and all of the software needed to maintain it are available 
upon request. 

Different data-selection protocols were used for different 
studies. Fig. 6 was based on all isoleucines in the database. 
Fig. 4 compares today's data on leucine with the data 
produced in 1999 by MacArthur and Thornton. For this study, 
we used the leucines in a PDB_SELECT (Hooft, Sander et al., 
1996b; http://swift.cmbi.ru.nl/gv/select/) data set of 21 667 
protein chains that had been solved by X-ray diffraction 
methods at a resolution better than 3.0 Å and that had an R 
factor lower than 0.25. This data set contains no pairs of 



Figure 1

Two-dimensional histograms of the χ1 and χ2 dihedral angles of isoleucine in structures with resolution ranges of (a) 0.0-2.0 Å and (b) 3.0-10.0 Å. The 
data are scaled logarithmically from black (minimum) to white (maximum) and are divided into nine sections of 120 × 120°. The bin size is 1°. A total of 
558 287 isoleucines contributed to the high-resolution plot (a) and 90 891 contributed to the low-resolution plot (b). (c) The nine rotamers of isoleucine 
with χ values of 60.0° (gauche+ or g+), 180.0° (trans or t) or 300.0° (gauche− or g−). 
sequences that are more than 90% identical upon sequence 
alignment. Figs. 1, 2, 5, 7(g), 7(h), 8 and 9 are based on the 
1 765 734 isoleucines that fulfilled all of the criteria listed in 
Table 1. 

The database was augmented with a large number of 
computationally derived parameters, such as torsion angles, 
secondary structure, φ and ψ, and residual WHAT_CHECK 
(Hooft, Vriend et al., 1996) quality parameters (summarized in 
Table 1). These parameters were obtained using the WHAT IF 
(Vriend, 1990) web service (Hekkelman et al., 2010) and DSSP 
(Kabsch & Sander, 1983; Joosten, te Beek et al., 2011). 
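
As an illustration of how such torsion angles can be derived from coordinates, the following minimal sketch computes a dihedral angle from four atom positions. It assumes the standard IUPAC atom definitions for isoleucine (χ1 = N-CA-CB-CG1, χ2 = CA-CB-CG1-CD1) and maps the result to the 0-360° range used in the figures; the function name `dihedral` is hypothetical and this is not the WHAT IF implementation.

```python
import math

def dihedral(p0, p1, p2, p3):
    """Torsion angle p0-p1-p2-p3 in degrees, mapped to [0, 360).

    For isoleucine (assumed standard atom names):
      chi1 = dihedral(N, CA, CB, CG1)
      chi2 = dihedral(CA, CB, CG1, CD1)
    """
    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b0 = sub(p0, p1)          # points from the central bond back to p0
    b1 = sub(p2, p1)          # central bond
    b2 = sub(p3, p2)          # points from the central bond on to p3
    norm = math.sqrt(dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)

    # Project b0 and b2 onto the plane perpendicular to the central bond.
    d0, d2 = dot(b0, b1), dot(b2, b1)
    v = (b0[0] - d0 * b1[0], b0[1] - d0 * b1[1], b0[2] - d0 * b1[2])
    w = (b2[0] - d2 * b1[0], b2[1] - d2 * b1[1], b2[2] - d2 * b1[2])

    x = dot(v, w)
    y = dot(cross(b1, v), w)
    return math.degrees(math.atan2(y, x)) % 360.0
```

With this convention a trans arrangement gives 180° and the two gauche arrangements give values near 60° and 300°.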

Electron densities from EDS (Kleywegt et al, 2004) were 
inspected using Coot (Emsley & Cowtan, 2004) and screenshots 
were obtained using PyMOL (v. 1.2.0.1; Schrödinger) 
and YASARA (Krieger et al, 2002). 

Plots were generated using the statistical language R (R 
Development Core Team, 2012), employing the plyr 
(Wickham, 2011) package for data aggregation, and two- 
dimensional histograms were generated using Java and GIMP 
(Kimball et al., 1997). 

The YASARA (Krieger et al., 2002) macro for Python (Lutz, 
2001) was used to isolate the first α-helix (Ser6-Cys16) from 
crambin (PDB entry 3nir; Schmidt et al., 2011) and to mutate 
all of its residues to alanines. The hydrogen positions in this 
α-helix were determined using the method of Hooft and 
coworkers (Hooft, Sander et al., 1996c; Krieger et al., 2012). 
This α-helix was energy-minimized in a simulation cell in 
vacuum using the NOVA (Krieger et al., 2002) force field, with 
nonperiodic boundaries, a 20 Å cutoff for nonbonded interactions 
and YASARA's Simspeed parameter set to 'slow'. A 
short 'steepest-descent minimization' was followed by simulated 
annealing (200 steps of 2 fs; at every tenth step the atom 
velocities were reduced by 0.9) until the energy improvement 
was less than 0.05 kJ mol⁻¹ per atom. After this minimization, 
Ala11 was mutated to an isoleucine (χ1 = 0, χ2 = 0) which was 
then rotated around its χ1 and χ2 in steps of 1.0° to obtain 360² 
different conformations. For each conformation, the total 
energy (in J mol⁻¹) of the system and the atom experiencing 



Table 1

Selection criteria for isoleucines.

Selection parameter              Criterion
-------------------------------  ------------------------------------------------------------
Experimental method              X-ray
Structure                        Not N-terminal or C-terminal, not next to a terminus and
                                 not next to Gly, Pro or a noncanonical residue
Side-chain and backbone atoms    Not mentioned in the WHAT_CHECK (Hooft, Vriend et al., 1996)
                                 quality report calculated with the WHAT IF web service
                                 (Hekkelman et al., 2010)†
B factor, and backbone atoms     <60.0 or <2.5 × the average B factor over
DSSP secondary structure         H, E or loop

† This extensive list of criteria is briefly explained at http://swift.cmbi.ru.nl/gv/isoleucine/. 
The criteria include, for example, atomic clashes, spurious covalent bonds, missing atoms 
etc. 

the greatest force were stored and used to colour the 
ln(energy) contours in the χ1-χ2 energy plots. Atom distance 
contour plots were produced in a similar way. For this plot, 
Ile11 was again rotated into 360² different conformations. All 
methyl and methylene groups were replaced by pseudo-atoms 
(MB for the Cβ of alanine and QG, MG and MD for Cγ1, Cγ2 
and Cδ1 of isoleucine, respectively) located at their centres of 
mass. For every conformation, the distances from QG, MG and 
MD of Ile11 to all other (pseudo) atoms (excluding 1-2 and 1-3 
interactions) were calculated. These distances were 
corrected by the van der Waals radii of the atoms involved (H, 
1.20 Å; C, 1.60 Å; N, 1.50 Å; O, 1.45 Å; RM, 2.00 Å; R2Q, 
2.00 Å; Tsai, 2007). The minimum resulting distance per 
conformation was plotted on a contour plot and the corresponding 
atom pairs were used as labels to colour this plot. 
The same two procedures were followed for the β-hairpin 
Val13-Gln26 from the VP2 subunit of human rhinovirus (PDB 
entry 1hrv; Rosenwirth et al., 1995), in which Gly19 (in the 
β-turn) was not mutated and Ala23 was mutated back to 
isoleucine. 
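
The distance analysis described above can be sketched as follows. The radii are those quoted in the text; the function name `closest_contact`, the tuple layout of the atoms and the interpretation of 'corrected by the van der Waals radii' as subtracting both radii from the interatomic distance are assumptions made for illustration. Removing 1-2 and 1-3 pairs is left to the caller.

```python
import math

# Van der Waals radii in Angstrom as quoted in the text; "M" and "Q" stand
# for the methyl and methylene pseudo-atoms (labels chosen for this sketch).
VDW_RADIUS = {"H": 1.20, "C": 1.60, "N": 1.50, "O": 1.45, "M": 2.00, "Q": 2.00}

def closest_contact(probe_atoms, other_atoms):
    """Return (corrected_distance, probe_label, other_label) for the pair
    with the smallest radius-corrected distance.

    Each atom is a tuple (label, radius_type, (x, y, z)).
    Assumption: corrected distance = distance - r_probe - r_other, so a
    negative value means the two spheres interpenetrate.
    """
    best = None
    for label_a, type_a, pos_a in probe_atoms:
        for label_b, type_b, pos_b in other_atoms:
            corrected = (math.dist(pos_a, pos_b)
                         - VDW_RADIUS[type_a] - VDW_RADIUS[type_b])
            if best is None or corrected < best[0]:
                best = (corrected, label_a, label_b)
    return best
```

Calling this once per (χ1, χ2) grid point and storing the returned pair of labels would give the kind of per-conformation labelling used to colour a contour plot.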

Two-dimensional histograms of χ1/χ2 distributions were 
fitted against a two-dimensional Gaussian function to determine 
the local maxima and two-dimensional spreads in each of 
the nine rotamer sections. 

Figure 2

Percentage of isoleucines in three sections as a function of secondary structure and resolution. The sections are as indicated in Fig. 1. Each dot represents 
the percentage of all isoleucines in a 0.1 Å wide resolution bin that have a χ1-χ2 combination corresponding to that section, and the secondary structure 
as indicated by the colour. Only isoleucines that fulfil the quality criteria of Table 1 are represented. α-Helix, blue; β-strand, red; loop, green. 
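
A rough way to obtain the centre and spread of the population within one section is sketched below. Note that this swaps in a simpler technique: it computes count-weighted moments (mean and covariance) of the binned data rather than performing the Gaussian fit described in the text, so the numbers would only approximate a fitted Gaussian. The function name and argument layout are hypothetical.

```python
import math

def section_moments(counts, chi1_start, chi2_start, bin_size=1.0):
    """Count-weighted mean and spread of one 2D histogram section.

    counts[i][j] is the number of residues in bin i along chi1 and bin j
    along chi2; chi1_start and chi2_start are the lower edges of the section
    in degrees. Returns (mean1, mean2, sd1, sd2, covariance).
    """
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("empty section")

    # First pass: weighted means of the bin centres.
    mean1 = mean2 = 0.0
    for i, row in enumerate(counts):
        for j, c in enumerate(row):
            mean1 += c * (chi1_start + (i + 0.5) * bin_size)
            mean2 += c * (chi2_start + (j + 0.5) * bin_size)
    mean1 /= total
    mean2 /= total

    # Second pass: weighted variances and covariance.
    var1 = var2 = cov = 0.0
    for i, row in enumerate(counts):
        for j, c in enumerate(row):
            d1 = chi1_start + (i + 0.5) * bin_size - mean1
            d2 = chi2_start + (j + 0.5) * bin_size - mean2
            var1 += c * d1 * d1
            var2 += c * d2 * d2
            cov += c * d1 * d2
    return (mean1, mean2,
            math.sqrt(var1 / total), math.sqrt(var2 / total), cov / total)
```

Because each section spans a contiguous 120° range, no special treatment of the 0°/360° wrap-around is needed inside a section, provided the section boundaries are chosen as in Fig. 1.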



3. Results 

The rotamers of isoleucine can be divided into nine sections, 
which are highly unevenly populated (see Fig. 1). This is 
mainly caused by strain owing to interactions between the 
and atoms of the side chain and atoms in the local back- 
bone. The low population of sections 1 and 2 became even 
more pronounced when we only analysed high-resolution 
structures (Fig. 1a). 
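
The division into nine sections can be sketched as a simple binning of the two angles into 120° ranges. The numbering used here is an assumption inferred from the panel titles of Fig. 5 and from the statements that section 6 corresponds to (χ1, χ2) = (300, 180°) and section 9 to (gauche−, gauche+): χ1 increases from left to right and χ2 decreases from top to bottom. The function names are hypothetical.

```python
def rotamer_index(angle_deg):
    """0 for gauche+ (0-120), 1 for trans (120-240), 2 for gauche- (240-360)."""
    a = angle_deg % 360.0
    # min() guards against floating-point results equal to 360.0
    return min(2, int(a // 120.0))

def section(chi1_deg, chi2_deg):
    """Section number 1-9 for a (chi1, chi2) pair.

    Assumed layout: columns follow chi1 (g+, t, g-); rows follow chi2 with
    gauche- in the top row, so sections 1-3 have chi2 = g-, 4-6 have
    chi2 = t and 7-9 have chi2 = g+.
    """
    column = rotamer_index(chi1_deg)
    row = 2 - rotamer_index(chi2_deg)
    return 3 * row + column + 1
```

For example, a residue with χ1 = 295° and χ2 = 165° falls in section 6 under this assumed numbering.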

Fig. 2 shows the occurrence of X1-X2 combinations as a 
function of resolution and secondary structure. At higher 





Figure 3 

Two examples of rare isoleucine rotamers that were unambiguously 
observed in good electron density in structures determined at around 
1.0 Å resolution. Left: Ile337 in chain A of the triacylglycerol lipase 
protein (PDB entry 1d5t; Brzozowski et al., 2000) is observed in section 1. 
Right: Ile15 in chain B of the HIV protease (PDB entry 1kzk; Reiling et 
al., 2002) is observed in section 2. Electron densities were obtained from 
the EDS server and are contoured at 1.4σ. At http://swift.cmbi.ru.nl/gv/ 
isoleucine/ the local structures of these residues are shown and the 
interactions that keep the residues in these unfavourable rotamers are 
explained. MolProbity (Chen et al, 2010) and WHAT_CHECK (Hooft, 
Vriend et al, 1996) call these two rotamers 'poor'. We believe that 
'improbable' is a better description. 
Figure 4 

Mean value of the χ1 gauche− rotamer as a function of resolution and secondary structure for 
leucine (independent of χ2). (a) Values reconstructed from the study of MacArthur & Thornton 
(1999). (b) As in (a) but for a much larger data set (see §2) and subdivided for secondary structure. 
Colour coding is the same as in Fig. 2. Each dot represents at least 50 observations. Error bars 
represent the standard errors of the mean. 



resolution, we observe significantly lower populations in 
sections 1, 2, 7 and 9. Fig. 2 shows the relative frequencies of 
isoleucines in sections 1, 2 and 6 as a function of resolution and 
secondary structure. 

As illustrated in Fig. 2, there are around ten times fewer 
isoleucines in sections 1 and 2 at high resolution (~1.5 Å) than 
at low resolution (~3.0 Å). The population differences of 
sections 1 and 2 (see Figs. 1 and 2) suggest that the incidence of 
isoleucines in these sections could be zero if we look solely 
at structures solved at extremely high resolution. However, 
inspection of two examples (see Fig. 3) reveals that χ1-χ2 
torsion-angle combinations corresponding to sections 1 and 2 
do occur in high-resolution protein structures. We compared 
all isoleucines in the original PDB files with the corresponding 
isoleucines in the PDB_REDO databank (Joosten & Vriend, 
2007; Joosten, Joosten et al, 2011). We studied 951 631 
isoleucines that have all atomic B factors less than 80 Å² and 
all atomic occupancies of 1.0 in both databases. We observe 
that sections 1 and 2 are less populated in the PDB_REDO 
files. In most cases we also observe that Xi is different. A full 
list of observed differences (also as a function of secondary 
structure, in absolute and relative terms) is available from 
http://swift.cmbi.ru.nl/gv/isoleucine/. 

In 1999, MacArthur and Thornton studied the resolution-dependence 
of χ1 for the 17 relevant amino-acid types 
(MacArthur & Thornton, 1999). At the time, they could only 
study χ1 because they did not have sufficient data to study 
details such as multiple side-chain torsion angles or secondary 
structure. They did, however, show a detailed plot for the 
gauche− angles of leucine as a function of resolution. Purely 
for reference, we compared this plot with the values that we 
obtained, split up into three secondary-structure types. Fig. 4 
shows the resolution-dependence of χ1 in leucine as determined 
by MacArthur and Thornton in 1999 and by our group 
in 2013. This plot deals with leucine rather than isoleucine 
because MacArthur and Thornton presented data for leucine 
but not for isoleucine. Fig. 4 confirms 
the previously observed resolution- 
dependence of the torsion angles, but it 
also shows that this dependence is more 
complicated than the near-linear beha- 
viour observed in 1999. 

Fig. 5 shows the average isoleucine 
side-chain torsion angles in the nine 
sections of Fig. 1 as a function of reso- 
lution and secondary structure. This 
figure shows that these values not only 
differ between different resolutions and 
secondary structures, but that the reso- 
lution dependence also differs between 
the nine sections. When comparing a 
single row or column in Fig. 5, the 
average value of one χ often depends 
on the rotamer of the other χ. 

It is tempting to look for trends in the 18 panels of Fig. 5. For 
example, the mean value of the χ1 gauche− rotamer is 
always largest for β-strand and smallest for α-helix, independent 
of χ2, while the mean value of the χ1 trans rotamer is 
always smallest in β-strand. The χ1 gauche+ rotamer, however, 
does not show such trends. The average values in a loop often 
lie between the average values in α-helix and β-strand, but in 
five of the 18 panels this is not the case. χ1 is more strongly 
influenced by the secondary structure than χ2, as might be 
expected from the smaller distance between χ1 and the 
backbone compared with the distance between χ2 and the 
backbone. The secondary structure has the least influence on 
χ2 when it adopts a trans conformation, corresponding to 
the Cδ1 atom pointing away from the backbone. As expected, 
the combination χ1, χ2 = gauche+, trans shows the least 
secondary-structure influence on both χ1 and χ2. However, the 



Figure 5

[Panel titles recovered: Section 1 (g+, g−), Section 2 (t, g−), Section 3 (g−, g−), Section 4 (g+, t), Section 5 (t, t), Section 6 (g−, t). Horizontal axis: Resolution (Å).]

The average side-chain torsion-angle values of Ile as a function of resolution and secondary structure. Each dot represents the mean of at least 50 data 
points. Error bars represent the standard errors of the mean. The nine panels in (a) show χ1 values. The nine panels in (b) show χ2 values. Colours are the 
same as in Fig. 2. In both (a) and (b), the panels are placed in the same order as in Fig. 1. Consequently, similar torsion angles are oriented vertically in (a) 
and horizontally in (b). The resolution bin size is 0.2 Å. 



Acta Cryst. (2014). D70, 1037-1049 



Berntsen & Vriend • Anomalies in the refinement of isoleucine 1041 






differences in the mean values are only marginally 
significant. 

A series of studies on the dependence of the backbone 
angle τ on resolution, secondary structure and the refinement 
software used (Touw & Vriend, 2010; Jaskolski et al, 2007; 
Karplus, 1996; Chakrabarti & Pal, 2001; Jiang et al, 2010; 
Lundgren & Niemi, 2012) revealed the importance of using 
the secondary structure as determined by DSSP (Kabsch & 
Sander, 1983; Joosten, te Beek et al, 2011), rather than using 
backbone torsion angles as a description of that secondary 



structure. The cooperative effects caused by the torsion-angle 
repeats and the repeat pattern of the hydrogen bonds cause 
systematic deviations of τ angles from their most relaxed 
value. Fig. 6 is a variant of Fig. 5. In Fig. 6 only those residues 
in an α-helix or β-strand are shown that have φ, ψ angles in 
agreement with the secondary structure (Touw & Vriend, 
2010). The residues observed in loops according to DSSP 
(green in Figs. 2, 4 and 5) have been split in Fig. 6 into three 
categories, depending on whether their φ, ψ angles correspond 
to those of an α-helix, a β-strand or neither. 
In most panels, the loop residues with α-helix and β-strand 
φ, ψ angles have more similar χ1 and χ2 angles than the 
residues actually in an α-helix and β-strand according to 
DSSP. The differences between the χ1 and χ2 angles in the 
three subcategories of the loop residues nevertheless tend to 
be significant. In Figs. 5 and 6, section 9 (χ1, χ2 = gauche−, 
gauche+) shows the most extreme resolution-dependence. This 
section also stands out in Fig. 1. 

The results shown in Figs. 1-6 are the result of a complex 
interplay between a large number of attractive and repulsive 
forces between atoms in the side chain and those in the local 
backbone. The fact that the isoleucines are observed in whole 
Figure 6 

The average side-chain torsion angles of isoleucine as a function of resolution, secondary structure and backbone dihedrals. Error bars represent the 
standard errors of the mean. The nine panels in (a) show χ1 values. The nine panels in (b) show χ2 values. In both (a) and (b), the panels are placed in the 
same order as in Fig. 1. Dark blue and red are for isoleucine in an α-helix or β-strand, respectively, that have the corresponding backbone φ, ψ angles. The 
lighter coloured blue, red and green data are isoleucines that are located in a loop according to DSSP, but that have φ, ψ angles as in an α-helix, β-strand 
or neither, respectively. Isoleucines without missing atoms were taken from virtually every suitable X-ray structure in the PDB to obtain acceptable 
counting statistics. The resolution bin size was 0.2 Å, so that each dot represents the mean of at least 50 data points. 
proteins, in which three-dimensional contacts with other 
residues occur, means that the plots are inevitably complicated 
by noise. Figs. 1-6 show that the number of local minima in the 
conformational space of isoleucine is limited. They also show 
that these minima are not typically observed with angles of 60, 
180 or 300°, as could be expected from a naive interpretation 
of sp³ hybridization of the atoms involved. Fig. 7 illustrates 
those forces that can contribute to this observation. 

For two typical backbone conformations (as in Figs. 7a and 
7b), Figs. 7(c) and 7(d) show which atom feels the largest 
overall force when the two side-chain torsion angles are 
rotated in the entire conformational space in 360 × 360 steps 
of 1°. In general terms, we observed that the conformations 
that are most populated in experimentally determined protein 
structures (Figs. 7g and 7h) correspond to areas where the 
energy (as computed with YASARA using the YASARA 
NOVA force field; Krieger et al., 2002) is optimal. Neither 
protein structures nor force fields are perfect yet, so the 
computed local minima in χ1-χ2 space do not correspond 
perfectly with the observed maxima in the frequency plots. 
However, the location and depth of the local minima in the 
energy plots correspond well enough with the maxima in the 
frequency plots to explain the observed frequency differences 
in the 2 × 9 sections in Fig. 7. For example, section 4 is relatively 
less populated in α-helix than in β-strand. This is caused 
by the close proximity of the isoleucine atom to the H"" 
atom of the alanine one turn away in the a-helix. This contact 
does not exist in the strand situation. Repulsive forces 
between the isoleucine atom and the backbone O atom of 
the residue four positions earlier in the α-helix cause section 9 
in the α-helix to be little populated. Section 9 in the β-strand, 
on the other hand, is intermediately populated. The difference 
in population of section 8 is fully explained by repulsive forces 



observed in the strand situation (with its own backbone H" 
atom and the N and N— H of the next residue) that are absent 
in the helix. The top half of the banana-shaped population 
distribution in section 9 falls in an area that is a local energy 
minimum in the YASARA NOVA force field. The bottom half 
of this banana shape is situated in an area that YASARA 
NOVA considers less favourable because of repulsive inter- 
actions between the atom and the C, O and N atoms of the 
peptide plane just before the isoleucine. 

Fig. 8 shows that the observed local optima in isoleucine's 
χ1-χ2 space do not agree with the optima derived from the 
YASARA NOVA (Krieger et al, 2002) force field in all 
sections. Obviously, this disagreement does not have any 




Figure 7 



(a, b) Energy-optimized starting situation of isoleucine modelled in a polyalanine helix (a) (χ1, χ2 = 293, 162°) and an antiparallel polyalanine strand (b) 
(χ1, χ2 = 306, 170°). (c) and (d) show in colour for 360² conformations which atom experiences the largest force using the YASARA NOVA force field for 
α-helix and β-strand, respectively. Grey lines separate coloured areas. Black lines connect conformations with equal energy. (e) and (f) show for each 
conformation which pair of atoms shows the largest interatomic penetration for α-helix and β-strand, respectively [1-3 interactions such as C"— or 
N— are not included in (e) and (f) because they are invariable; side-chain protons are not used for clarity]. Note that different colouring schemes are 
used in (c), (d), (e) and (f). (g) and (h) show the frequency distribution of isoleucines in the nine sections in structures solved at better than 2.0 Å 
resolution for α-helix and β-strand, respectively. (g) and (h) are otherwise similar to Fig. 1. 
quantitative value because the YASARA NOVA force field 
was not used in the X-ray refinement protocols. However, 
when we produce plots such as that shown in Fig. 1 for X-ray 
structures solved at better than 2.0 Å resolution and exclusively 
refined with either CNS (Brünger et al., 1998) or 
REFMAC (Murshudov et al., 2011), we find that the local 
optima observed in the β-strand are about 1° further away 
from YASARA's unfavourable areas in the CNS structures 
than in the REFMAC structures. This is evident, for example, 
in sections 3 and 9, which contain the most observed isoleucine 
χ1-χ2 combinations in areas that YASARA considers unfavourable. 

Fig. 9 shows the behaviour of χ1 and χ2 as a function of the 
resolution and the refinement software used for isoleucines in 
an α-helix in section 6 (χ1, χ2 = 300, 180°). While Xi shows 
hardly any significant differences, the Xi values for REFMAC 
and SHELXL (Sheldrick, 2008) show systematic deviations 
from the values observed for the programs that use molecular 
dynamics-based force fields [X-PLOR (Brünger, 1992), CNS 
(Brünger et al., 1998)] or other forms of torsion-angle restraint 
(PHENIX; Adams et al., 2010). 

4. Discussion 

Despite rapidly approaching the milestone of 100 000 entries, 
the PDB still does not contain sufficient data to allow us to 
make the detailed subset selections needed to answer some of 
the remaining questions. In future studies, we want to use 



Figure 8 

Isoleucines in an α-helix (left) or in a β-strand (right). This figure was generated by superimposing Figs. 7(g) and 7(h) on Figs. 7(c) and 7(d), respectively. 
Figure 9 

χ1 (top) and χ2 (bottom) as a function of the resolution and refinement software employed for isoleucines in an α-helix (left) and in a β-strand (right). 
These χ1 and χ2 values correspond to section 6, which is the most populated section in all cases (more than half of all isoleucines fall in this section). 
REFMAC (Murshudov et al., 2011; red) and SHELXL (Sheldrick, 2008; light blue) behave similarly and so do the packages that explicitly use molecular 
dynamics-based force fields (X-PLOR, Brünger, 1992, dark blue; CNS, Brünger et al., 1998, green). PHENIX (Adams et al., 2010) results are in purple. 
Error bars represent the standard errors of the mean. 



culled data sets that contain only one copy of a particular 
series of similar structures that were solved by the same group 
using the same software. This approach was taken when 
selecting the data underlying Fig. 4, but the associated 
reduction in available data (a factor of 5-10) was unacceptably 
large. In many of the panels in Figs. 2, 4, 5, 6 and 9, dots are 



missing because we (arbitrarily) require each dot to represent 
at least 50 observations. A reduction in the number of PDB 
files by a factor of 5-10 would greatly increase the number of 
missing observations unless we correspondingly reduced the 
number of data points per dot (and thus increased the noise) 
in these figures. 
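
The trade-off between the number of observations per dot and the number of missing dots can be made concrete with a minimal sketch of the binning used in plots of this kind: observations are grouped by resolution, bins with fewer than a minimum number of observations are dropped, and the mean and its standard error are reported for the rest. The function name and the tuple layout are hypothetical; the defaults of 0.2 Å and 50 observations follow the figure captions.

```python
import math
from collections import defaultdict

def binned_means(observations, bin_width=0.2, min_count=50):
    """Mean torsion angle per resolution bin.

    observations: iterable of (resolution_in_angstrom, angle_in_degrees).
    Returns a list of (bin_lower_edge, mean, standard_error, n) for every
    bin holding at least min_count observations. The plain arithmetic mean
    assumes all angles lie within one rotamer section (no 0/360 wrap).
    """
    bins = defaultdict(list)
    for resolution, angle in observations:
        bins[math.floor(resolution / bin_width)].append(angle)

    result = []
    for index in sorted(bins):
        values = bins[index]
        n = len(values)
        if n < max(min_count, 2):   # at least 2 values are needed for a variance
            continue
        mean = sum(values) / n
        variance = sum((v - mean) ** 2 for v in values) / (n - 1)
        result.append((index * bin_width, mean, math.sqrt(variance / n), n))
    return result
```

Lowering `min_count` fills in missing dots at the price of larger standard errors, which is the effect described above.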
It is practically impossible to determine how refinement 
programs and their sets of target values and/or force fields 
have changed over the years. When we split the data into two 
sets (structures solved before 2005 versus structures solved 
after 2004), we observe much smaller differences between the 
years than between secondary-structure elements. These 
data are available from the associated website (http:// 
swift.cmbi.ru.nl/gv/isoleucine/). 

The SCWRL rotamer library described by Dunbrack & 
Karplus (1993) uses the backbone torsion angles φ and ψ as a 
basis for rotamer prediction. Touw & Vriend (2010) showed 
that the cooperative effect of secondary-structure elements 
causes τ angles to be systematically different between residues 
in α-helix or β-strand and residues in a loop that have the same 
backbone torsion angles as residues in an α-helix or β-strand. 
Figs. 5 and 6 show that a similar effect is observed for the 
relationship between side-chain rotamer preference and 
backbone torsion angles. The larger number of interactions 
involved in the side-chain rotamer choices, illustrated in Figs. 7 
and 8, makes this effect less straightforward to explain for the 
isoleucine χ1, χ2 angles than it was for τ angles. 

The YASARA NOVA (Krieger et al., 2002) force-field set is 
based on the original Amber (Cornell et al., 1995) force-field 
set. Like other force-field sets such as CHARMM (MacKerell 
et al., 1998) or GROMACS (Hess et al., 2008), the Amber set is 
not complete. For example, it does not include terms such as 
induced polarization or higher order moments of aromatic 
planes. YASARA's force-field parameters have been extensively 
optimized to cope with these problems by minimizing 
the r.m.s. difference between high-resolution X-ray structures 
and homology models before and after short MD simulations. 
Consequently, the YASARA force field relies less on the 
preconceptions of basic physics and keeps proteins closer to 
reality than the original Amber force field. CNS (Brünger et 
al., 1998) and X-PLOR (Brünger, 1992) use energy force-field 
parameters that are somewhat similar to the values used by 
YASARA, while PHENIX (Adams et al., 2010) can use periodic 
torsion-angle restraints that perform a similar function. The 
resulting forces push the atoms in isoleucine side chains in 
several sections of the χ1, χ2 plot slightly away from their real 
position. At a resolution of around 1.0 Å this effect is negligible, 
because the X-ray data will decide where the atoms end 
up. However, at a resolution of 2 Å this effect is already 
significant, and at lower resolutions we observe differences of 
up to 10°. Side-chain torsion angle rotations of a few degrees 
in short side chains such as isoleucine will not lead to biolo- 
gically significant errors in atomic positions. The errors 
enforced on the coordinates by the force field employed are 
nevertheless systematic in nature. It seems that refinement 
programs that use molecular dynamics-like energy parameters 
can be improved by adding more realistic, structure analysis- 
based, force-field terms similar to those incorporated into 
YASARA for the purpose of optimizing homology models 
(Krieger et al, 2002). 
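
The way in which a periodic torsion term can bias a side chain is illustrated by the following sketch. It uses a generic threefold cosine term of the kind common in molecular-mechanics force fields, with minima at 60, 180 and 300°; the functional forms and parameters actually used in CNS, X-PLOR or PHENIX are not given in the text, so the function names and the force constant `k` are purely illustrative assumptions.

```python
import math

def threefold_energy(chi_deg, k=1.0):
    """Generic threefold torsion term E = k * (1 + cos(3 * chi)).

    The minima (E = 0) lie at chi = 60, 180 and 300 degrees.
    """
    return k * (1.0 + math.cos(math.radians(3.0 * chi_deg)))

def threefold_gradient(chi_deg, k=1.0):
    """dE/dchi (energy per radian) of the term above."""
    return -3.0 * k * math.sin(math.radians(3.0 * chi_deg))

# At chi1 = 293 degrees (the helix value quoted in the caption of Fig. 7)
# the gradient is negative: the energy falls as chi1 increases, so this
# term alone would push the angle towards the ideal value of 300 degrees.
pull_at_293 = threefold_gradient(293.0)
```

When the X-ray data are weak, a restraint of this shape has nothing to counteract it, which is consistent with the larger deviations reported at low resolution.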

To assist the developers of refinement software, we have 
determined the χ1, χ2 optima and their two-dimensional 
spreads for each section and each secondary-structure type in 
high-resolution structures. The loop residues have been split 
into three categories: loops with helical φ, ψ angles, loops with 
strand φ, ψ angles and loops with other φ, ψ angles. The 
resulting values show that it is better to use secondary-structure 
elements when an α-helix or β-strand is observed and 
φ, ψ angles in loops. A revised approach to torsion-angle 
restraints, possibly including correlated torsion-angle 
restraints, might need to be introduced in refinement software 
packages to optimally use these database-derived potentials. 